System and method for the monitoring of a measurement and control device

ABSTRACT

Described is a system and a method to monitor measuring and control equipment. The occurrence of a malfunction does not immediately lead to the monitoring system entering a secure state, but rather increases the count of a counter. If the count exceeds a certain value, then the monitoring system enters a secure state.

FIELD OF THE INVENTION

[0001] The present invention relates to a system for monitoring equipment for measuring, controlling, and regulating, and to a corresponding method.

BACKGROUND INFORMATION

[0002] Known monitoring systems for measuring and control equipment allow for the system to enter a so-called secure state in response to a malfunction occurring. The secure state either causes the current operating state of the measuring and control equipment to change or prevents the operating state from being changed at a later time. One can then provide for, e.g., the measuring and control equipment, the system controlled by the measuring and control equipment, or both the measuring and control equipment and the controlled system, being switched off in response to the occurrence of a malfunction.

[0003] A system for controlling and/or regulating an internal combustion engine is known from German Published Patent Application No. 40 04 083. This includes several sensors, which generate signals that represent the operating parameters of the internal combustion engine. Malfunction detection is carried out using these signals. Within predefined sub-ranges, the malfunction monitoring is carried out with a lower sensitivity than outside of these predefined sub-ranges. If a malfunction is detected, then it can initially be checked if this can be attributed to impaired or incomplete signal transmission. The system is only switched off when this is not the case. In this manner, the system is prevented from switching off in response to a malfunction of one of the sensors.

[0004] A disadvantage of the described system is that the system is immediately switched off in response to the occurrence of certain malfunctions. This indeed means that the safety of operation is high, but also that the availability is insufficient.

SUMMARY OF THE INVENTION

[0005] Therefore, an object of the present invention is to propose a device and a method for monitoring measuring and control equipment, which ensure both ample operational safety and satisfactory availability.

[0006] The system of the present invention to monitor equipment for measuring, controlling, and regulating has a monitoring device, which monitors the method of functioning of the measuring and control equipment. In this context, malfunctions of the monitored measuring and control equipment are detected. In addition, the monitoring device may control the operating state of the equipment. The system distinguishes itself in that a counter having a count is provided, the detection of a malfunction increases the count, and the operating state of the measuring and control equipment is controlled as a function of the count.

[0007] A malfunction is indeed detected, but does not necessarily or directly result in the operating state of the measuring and control equipment being controlled, i.e. result in the measuring and control equipment possibly being switched off. Initially, the occurrence of a malfunction only leads to the count of the counter being increased. This only results in disconnection when the count reaches a certain, predetermined value. This value is variable and represents the reaction threshold of the monitoring device. By selecting the reaction threshold, the user has the option of setting his system for monitoring with regard to operational safety and availability, in accordance with his requirements.

[0008] The monitoring device preferably monitors the method of functioning of the measuring and control equipment, using communications operations carried out at regular time intervals. Each communications operation, which includes an exchange of data between the monitoring device and the measuring and control equipment, yields either a malfunction or correct functioning. Therefore, the reaction time of the monitoring system may also be determined by the choice of intervals between the communications operations.

[0009] In a preferred, specific embodiment, the detection of correct functioning reduces the count of the counter. This prevents sporadically occurring malfunctions from resulting in the measuring and control equipment being switched off, since detected instances of correct functioning reduce the count again and again.

[0010] It is advantageous when the count can be controlled independently of the occurrence of malfunctions. This makes sense when the predetermined reaction threshold appears to be too high in some operating states. For example, the measuring and control system may keep the counter of the monitoring device just below the reaction threshold, using deliberate, false information. This holding of the counter is maintained for the duration of the special operating state that is critical with regard to safety. Consequently, the short reaction time from the occurrence of a fault to the reaction of the monitoring device provides the monitoring system with the maximum possible safety.

[0011] According to a particularly preferred specific embodiment of the system of the present invention, the count of the at least one counter is compared to a threshold value, a reset or a fault reaction being triggered in response to the threshold value being reached or exceeded. In practice, the monitoring of such a threshold value turns out to be simple and reliable.

[0012] A second counter level is advantageously defined below the threshold value, the count not being allowed to fall below the second counter level, and an artificially generated malfunction being input into the system in response to the second counter level being reached.

[0013] In this connection, it is conceivable for the reaction threshold or the threshold value to be adjustable or variable. This measure makes it possible to adjust to specific operating states.

[0014] The variation of this second counter level also allows the desired availability or reaction time of the system to be flexibly adjusted.

[0015] Therefore, depending on the situation, one may also select between maximum safety and maximum availability during continuous operation, with an arbitrary number of graduations.

[0016] According to a particularly preferred embodiment of the system of the present invention, a first fault counter assigned to the monitoring device and a second fault counter assigned to the equipment to be monitored are provided, which may be periodically checked and/or compared to each other in order to monitor the system. This measure allows the function of the equipment to be monitored to be checked, using the first counter, and allows the function of the monitoring device to be checked, using the second fault counter. A periodic comparison of the counts of these two fault counters also allows so-called sporadic faults to be detected in a simple manner, as will be explained later in the specification.

[0017] In this context, it is advantageous that the first fault counter may be used for counting an image of the second fault counter's count. Therefore, the so-called expected value of the second fault counter may be stored, using the first fault counter.

[0018] A third fault counter, which is used to compare the counts of the first and second fault counters, is advantageously provided.

[0019] The method of the present invention provides for a counter being used whose count is increased in response to detecting malfunctions, and for the control of the operating state of the monitored equipment being carried out as a function of the count.

[0020] The method of functioning of the measuring and control equipment is preferably checked using communications operations performed at regular intervals. Each communications operation reveals either a malfunction or correct functioning.

[0021] It is advantageous when correct functioning is registered by a reduction in the count of the counter. This ensures that sporadically occurring malfunctions also do not result in the operating state of the measuring and control equipment being influenced.

[0022] The count may advantageously be controlled independently of the occurrence of malfunctions. Thus, the reaction time of the monitoring system may be adapted to current requirements during continuous operation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] FIG. 1 shows a schematic representation of a preferred specific embodiment of the monitoring system according to the present invention.

[0024] FIG. 2 shows a diagram for explaining a preferred specific embodiment of the method according to the present invention.

[0025] FIG. 3 shows a representation corresponding to FIG. 1, of a further preferred embodiment of the monitoring system according to the present invention, the (schematic) representation of the internal combustion engine being dispensed with in this case.

[0026] FIG. 4 is a first diagram for explaining a possible time sequence of the method according to the present invention.

[0027] FIG. 5 is a second diagram for explaining a possible time sequence of the method according to the present invention.

DETAILED DESCRIPTION

[0028] FIG. 1 shows, in a schematic representation, a preferred specific embodiment of the monitoring system according to the present invention, in use. Shown are an internal combustion engine 1, equipment 2 for measuring, controlling, and regulating (measuring and control equipment), a monitoring device 3, and a counter 4.

[0029] Internal combustion engine 1 is controlled with the aid of measuring and control equipment 2. Measuring and control equipment 2 is in turn monitored by monitoring device 3. This monitoring is accomplished by communications operations between monitoring device 3 and measuring and control equipment 2. If a malfunction is detected, the count of counter 4 is increased. If correct functioning is registered, then the count is reduced. As soon as the count reaches a certain value, monitoring device 3 assumes a secure state. This results in measuring and control equipment 2, and possibly internal combustion engine 1 as well, being switched off.

[0030] In a diagram, FIG. 2 explains the execution of a preferred specific embodiment of the method according to the present invention. The Roman numerals in the drawing represent the count.

[0031] The count is I in state 5. After a certain time span, a communications operation occurs between monitoring device 3 and measuring and control equipment 2. If a malfunction is detected in the process, then the count is increased to II, as is represented in state 6. A further communications operation occurs after a certain time span. If a malfunction is detected, then the count is increased to III, which corresponds to state 7. The count is otherwise reduced to I, which corresponds to state 5. If the count is III, in accordance with state 7, then the detection of correct functioning causes the count to be reduced to II, state 6. If a malfunction is detected in state 7, then the count is increased to IV, which corresponds to state 8. A communications operation is repeated in state 8. If this detects correct functioning, then the count decreases to III, i.e. state 7. If a malfunction is detected in state 8, then the count is increased to V, which corresponds to state 9. This count causes monitoring device 3 to assume the secure state. As a result, measuring and control equipment 2 and internal combustion engine 1 are switched off. Therefore, a count of V represents the reaction threshold of the monitoring system for the exemplary embodiment shown.
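By way of illustration only, the sequence of FIG. 2 may be sketched as a small state machine. The function names and the convention that the count does not fall below I are assumptions made for this sketch and are not taken from the drawing.

```python
# Illustrative sketch of the FIG. 2 sequence (states 5 to 9, counts I to V).
# All names are hypothetical; the floor at count I is an assumption.
REACTION_THRESHOLD = 5  # count V, state 9
LOWEST_COUNT = 1        # count I, state 5


def next_count(count: int, malfunction: bool) -> int:
    """Return the count after one communications operation."""
    if malfunction:
        return min(count + 1, REACTION_THRESHOLD)
    return max(count - 1, LOWEST_COUNT)


def run(results, start: int = LOWEST_COUNT):
    """Process a sequence of results (True = malfunction detected).

    Returns (final_count, secure_state_entered).
    """
    count = start
    for malfunction in results:
        count = next_count(count, malfunction)
        if count >= REACTION_THRESHOLD:
            # The monitoring device assumes the secure state.
            return count, True
    return count, False
```

In this sketch, isolated malfunctions separated by correct results never accumulate to count V, whereas an uninterrupted run of malfunctions does.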

[0032] According to a further preferred embodiment, the system and method of the present invention may be implemented using a number of cooperating fault counters. This is described below in light of a function-computer monitoring module, using three fault counters: A first fault counter 4 is provided in monitoring module 3 of measuring and control equipment (function computer) 2. A second fault counter 14, which is a copy of fault counter 4, is provided in measuring and control equipment 2. The task of fault counter 4 is to count incorrect responses of measuring and control equipment 2. Fault counter 14 in measuring and control equipment 2 is used to store the expected value of fault counter 4. A further fault counter 24, which counts inconsistencies between counters 4 and 14, is advantageously provided in the measuring and control equipment.

[0033] The following strategy is, for example, applicable to the counters: It is assumed that the operating state of measuring and control equipment 2 is controlled in response to the count value of fault counter 4 reaching 13. In the following, this is assumed to be a reset. One starts, for example, with a beginning count of 11, in order to prevent a defective measuring and control device from being activated after initialization. If a correct response, e.g. from measuring and control equipment 2, reaches fault counter 4, its count is reduced by 1 (this always occurs in the case of a correct response, if the count is greater than 0). If an incorrect response is detected, then three fault points are added. In the case in which a count greater than or equal to 13 is reached, a reset of the measuring and control equipment is triggered.
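The counting strategy of this paragraph may be sketched as follows. The class and method names are hypothetical; the numerical values (threshold 13, beginning count 11, minus 1 per correct response, plus 3 per incorrect response) are those given by way of example above.

```python
# Illustrative sketch of the example strategy for fault counter 4.
# Class and method names are assumptions; the values come from the text.
RESET_THRESHOLD = 13
INITIAL_COUNT = 11
FAULT_POINTS = 3


class FaultCounter:
    def __init__(self, count: int = INITIAL_COUNT) -> None:
        self.count = count

    def record(self, response_ok: bool) -> bool:
        """Update the count for one response; return True if a reset is due."""
        if response_ok:
            if self.count > 0:  # the count is only reduced while above 0
                self.count -= 1
        else:
            self.count += FAULT_POINTS
        return self.count >= RESET_THRESHOLD
```

Starting at 11, a single incorrect response immediately after initialization yields 14 and therefore a reset, which corresponds to the stated purpose of the beginning count.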

[0034] To check if monitoring module 3 is functioning correctly, measuring and control equipment 2 deliberately intersperses incorrect responses at an appropriate count of fault counter 4, in order to check if, and to what extent, monitoring module 3 detects incorrect responses and its fault counter 4 accordingly counts these responses correctly. Since, for example, the system only allows the measuring and control equipment to detect the current count of counter 4 at every 32nd inquiry-response communication (communications frame), fault counter 14 in the measuring and control equipment is used internally in the measuring and control equipment to count a representation of fault counter 4. Therefore, fault counter 14 contains the so-called expected value of fault counter 4. If monitoring module 3 signals the count of its fault counter 4 in place of the 32nd inquiry in the cycle, then the measuring and control equipment compares the expected value, i.e. the count value of fault counter 14, to the signaled value, i.e. the count value of fault counter 4. If these two count values do not agree, then third fault counter 24 is increased by three points. If there is agreement, then the count value of fault counter 24 is decreased by 1.
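The comparison of the expected value with the signaled value may be sketched as follows. The names are hypothetical. Two details are assumptions of this sketch rather than statements of the paragraph above: fault counter 24 is not reduced below 0, and fault counter 14 is set to the signaled value after each comparison.

```python
# Illustrative sketch of fault counters 14 and 24 inside the measuring and
# control equipment. Names, the floor at 0 for counter 24, and the
# resynchronization step are assumptions.
FAULT_POINTS = 3


class EquipmentSideCounters:
    def __init__(self, expected: int = 11) -> None:
        self.expected = expected  # fault counter 14: image of fault counter 4
        self.mismatch = 0         # fault counter 24: inconsistencies

    def note_response(self, correct: bool) -> None:
        """Track what fault counter 4 should do with the response just sent."""
        if correct:
            self.expected = max(self.expected - 1, 0)
        else:
            self.expected += FAULT_POINTS

    def compare(self, signaled: int) -> bool:
        """Compare at the 32nd frame; return True if the values agree."""
        agree = signaled == self.expected
        if agree:
            self.mismatch = max(self.mismatch - 1, 0)
        else:
            self.mismatch += FAULT_POINTS
        self.expected = signaled  # adopt the true count of fault counter 4
        return agree
```

A signaled value that is higher than expected indicates that monitoring module 3 counted a fault the equipment did not send deliberately.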

[0035] Fault-tolerance times must continually be taken into consideration in systems for monitoring measuring and control equipment. In the exemplary embodiment described here, the monitoring plan is hierarchically constructed in three levels, the first level being formed by measuring and control equipment 2, which is monitored by the second level, an internal software check test not represented in detail. The third level, which is essentially monitored by monitoring module 3, is used to monitor the second level, i.e. the hardware, which is used to carry out the software monitoring.

[0036] If, according to a first case constellation, a fault occurs on the first level, i.e. in the measuring and control device, then the tolerance time is a function of the reaction speed of the second level, i.e. of the internal software monitoring, which advantageously has direct access to the output stages of the measuring and control equipment.

[0037] Such an access path via a computer pin typically carries the name of “PEN” (=Power ENable) and switches, for example, the actuator system of a connected motor to high resistance.

[0038] An example of another case is the occurrence of a fault in the computer hardware (measuring and control hardware), which means that the fault has to be detected via the third level.

[0039] A hardware fault results in an incorrect response of the measuring and control equipment. In this case, monitoring module 3 detects the incorrect response and repeats, for example, the inquiry that was responded to incorrectly, until the response is correct. If, in this connection, fault counter 4 exceeds its reaction threshold before the inquiry is responded to correctly, then monitoring module 3 triggers a reset of measuring and control equipment 2. The fault-tolerance time now depends on how many false responses must be received in order for fault counter 4 to exceed the reaction threshold. When the fault counter has a count of 0, then, for example, five incorrect responses must be received in succession in order to exceed the threshold of 13 selected for purposes of illustration. In the case in which each inquiry-response communication typically lasts 40 ms, the result here is a monitoring-module reaction time of approximately 200 ms.

[0040] Since a representation of fault counter 4 is logged in the measuring and control equipment, using fault counter 14, fault counter 4 may be influenced by deliberate, incorrect responses, in order to keep it closer to the reaction threshold. However, this brings an unknown variable to the forefront, namely the occurrence of so-called “sporadic faults”. These are unpredictable faults that occur randomly, mostly due to external effects. The monitoring module detects an incorrect response and advances its fault counter 4. Of course, these faults cannot be logged in the expected value of counter 14, since the measuring and control equipment assumes that the response was transmitted correctly. These discrepancies are discovered when the count of fault counter 4 is signaled back in place of every 32nd inquiry, and they result in an increase in the count of counter 24.

[0041] Rare faults that occur sporadically should not lead to a reset of the system when this adversely affects the user. Of course, this condition limits the possibilities of decreasing the fault-tolerance time using the “level control” of the fault counter in the measuring and control equipment. Frequent sporadic faults, however, should lead to a reset; EMC-contaminated high-voltage lines, which are not able to ensure safe operation, may be named here as an example.

[0042] The stipulation that a rare, sporadic fault should not result in an immediate reset of the system is explained by way of example. It means that counter 4 is allowed to reach a maximum count of 10, in spite of the incorrect responses that are interspersed: fault counter=10 → correct response → fault counter=9 → correct response → fault counter=8 → correct response → fault counter=7 → deliberately incorrect response → fault counter=10 → . . .
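The “level control” sequence given above may be reproduced with the following sketch. The function name and its parameters are hypothetical; it assumes that the equipment sends a deliberately incorrect response whenever the count has fallen to the chosen lower level and a correct response otherwise.

```python
# Illustrative sketch of "level control": the count is held between a lower
# level and (lower level + fault points). Names are assumptions.
def level_control(start: int, lower_level: int, frames: int,
                  fault_points: int = 3) -> list:
    """Return the counts of fault counter 4 over a number of frames."""
    counts = [start]
    count = start
    for _ in range(frames):
        if count <= lower_level:
            count += fault_points  # deliberately incorrect response
        else:
            count -= 1             # correct response
        counts.append(count)
    return counts


sequence = level_control(start=10, lower_level=7, frames=8)
# sequence is [10, 9, 8, 7, 10, 9, 8, 7, 10]
```

With a lower level of 7, the count never rises above 10 as long as no sporadic fault occurs.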

[0043] The occurrence of a sporadic fault increases the count of counter 4 by three points, i.e. this would result in a count of 13. In the case of a fault count of 7, the maximum time leading up to a reaction is the duration of three incorrect responses, i.e. 3×40 ms=120 ms.

[0044] Since the counter 14 in the measuring and control equipment may only be adjusted to the true count of the counter 4 in monitoring module 3 after every 32nd communications frame, only one sporadic fault may occur within this time, since this uses up the reserve for this time frame. Therefore, sporadic faults may only occur at a minimum interval of 31 frames=31×40 ms=1.24 s. Otherwise, they trigger an (unwanted) reset. If two sporadic faults are permissible within a time of 1.24 s, the maximum tolerance time that occurs increases to 160 ms (admissibility of an additional incorrect response). To assess the frequency of sporadic faults occurring, it is necessary to conduct trials in the real system.
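The reaction times quoted in paragraphs [0039], [0043], and [0044] may be checked with the following sketch. It assumes that a reaction occurs only when the count strictly exceeds the threshold of 13, which is the reading under which the three figures agree. The function names are hypothetical, and the lowest count of 4 used for the 160 ms case is an assumption chosen to leave room for two sporadic faults.

```python
# Illustrative arithmetic for the fault-tolerance times. Names are
# assumptions; the strict "exceeds" comparison is an assumed reading.
FRAME_MS = 40  # typical duration of one inquiry-response communication


def responses_to_exceed(count: int, threshold: int = 13,
                        fault_points: int = 3) -> int:
    """Number of consecutive incorrect responses until count > threshold."""
    n = 0
    while count <= threshold:
        count += fault_points
        n += 1
    return n


def reaction_time_ms(count: int) -> int:
    return responses_to_exceed(count) * FRAME_MS


times = {start: reaction_time_ms(start) for start in (0, 7, 4)}
# times is {0: 200, 7: 120, 4: 160}
min_interval_ms = 31 * FRAME_MS  # 1240 ms, i.e. 1.24 s
```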

[0045] In order to reduce the risk of a reset due to sporadic faults, the “level control” of the count of counter 4 may be implemented as a function of the driving situation. The manner in which the “counter level” is controlled most effectively depends on various boundary conditions (required tolerance time, required fault sensitivity, etc.) and must be tested in the real system, as well.

[0046] It should be pointed out that, in monitoring module 3, the RAM test may be designed as a writability test, so that a so-called “sleeping fault” may be formed. If a bit inversion produces too low a value in fault counter 4, then the strategy of “level control” may fail.

[0047] In a third case, the communication may break down for unknown or arbitrary reasons, so that monitoring module 3 detects the missing response after, e.g., a 10.51 ms timeout, switches off the output stages of the measuring and control device, and triggers a reset. In the worst case, even the time for posing an inquiry, e.g. 10 ms, must be included, so that one must expect a delay time of 20.51 ms.

[0048] The method according to the present invention is explained once more by way of example, using the graphs of FIGS. 4 and 5.

[0049] In these graphs, the x axis represents the time (subdivided into individual cycles), and the y axis represents the count of counter 4.

[0050] Drawn into FIG. 4 are three special counter readings, which will be explained in detail. Count 13 is a threshold value, which may not be exceeded. In the case in which this threshold value is exceeded, the result is a reset or a fault reaction of the system or the count. A counter level A is drawn in at count 7, and a counter level B is drawn in at count 1. This should make clear that, according to a preferred specific embodiment of the method of the present invention, a second counter level located below the threshold value is variable. According to the specific embodiment represented in FIG. 4, counter level B (count 1) is active, i.e. the count may decrease to a value of 1, before an incorrect response that is artificially interspersed increases the count by a value of 3 (see arrow P). In the case of a threshold value of 13 and a possible, lower count of 1, it is apparent that up to 4 faults may be tolerated without a fault reaction occurring or the system resetting. When these parameters are set, the system has a high availability and high tolerance, and at the same time, a relatively long reaction time. In the exemplary embodiment of FIG. 4, the typical reaction time R is 4 cycles. For example, 4 fault occurrences are represented by high-voltage flashes, whose occurrence at a time t_(F) results in a reset (not shown), since the threshold value is exceeded at point y(t_(F)).

[0051] Using FIG. 5, it is now explained how a shorter reaction time may be attained.

[0052] According to the specific embodiment of FIG. 5, it can be seen that the counter level A having a count of 7 is active. In other words, a decrease in the count below the value of 7 is not permitted. Consequently, this specific embodiment typically tolerates just one fault, before a fault reaction results from threshold value 13 being reached. In this case, reaction time R is only two cycles. For purposes of illustration, high-voltage flashes and points t_(F) and y(t_(F)) are once again shown.
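The relationship between the active counter level and reaction time R in FIGS. 4 and 5 may be sketched as follows. The function name is hypothetical, and the sketch assumes that one fault of three points occurs per cycle and that the fault reaction is triggered as soon as the count reaches threshold value 13, as described for FIG. 5.

```python
# Illustrative sketch: cycles needed to reach the threshold from the active
# counter level. Names and the one-fault-per-cycle model are assumptions.
import math


def cycles_to_reach(level: int, threshold: int = 13,
                    fault_points: int = 3) -> int:
    """Consecutive faulty cycles until the count reaches the threshold."""
    return math.ceil((threshold - level) / fault_points)


reaction_b = cycles_to_reach(1)  # counter level B (FIG. 4): 4 cycles
reaction_a = cycles_to_reach(7)  # counter level A (FIG. 5): 2 cycles
```

Raising the active counter level from 1 to 7 thus halves reaction time R in this model, at the cost of tolerating fewer faults.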

[0053] Finally, it should be pointed out that it would also be possible to make the threshold value variable, in addition to or instead of varying the lower counter level.

What is claimed is:
 1. A system to monitor equipment (2) for measuring, controlling, and regulating, the system having a monitoring device, which monitors the method of functioning of the equipment (2), and in so doing, detects malfunctions of the equipment (2) and may control the operating state of the equipment (2), at least one counter (4, 14, 24) being provided and the detection of a malfunction increasing the count of the at least one counter, and a first count of the at least one counter (4, 14, 24) being provided, where a reset and/or a fault reaction may be triggered in response to the first count being reached or exceeded, wherein the difference between the count and the first count is reduced by an artificially generated malfunction, which causes an increase in the count and is input into the system.
 2. The system as recited in claim 1, wherein the monitoring device (3) checks the method of functioning of the equipment (2), using communications operations carried out between the monitoring device (3) and the equipment (2) in regular intervals, and each communications operation reveals a malfunction or correct functioning.
 3. The system as recited in claim 1 or 2, wherein the detection of correct functioning causes the count of the counter (4) to decrease.
 4. The system as recited in one of claims 1 through 3, wherein the count is to be controlled independently of the occurrence of malfunctions.
 5. The system as recited in claim 1, wherein the first count may be compared to a threshold value, and a second counter level lying below the threshold value may be defined, the count of the at least one counter (4, 14, 24) not being allowed to fall below the second counter level, and the artificially generated malfunction that increases the count being input into the system in response to the second counter level being reached.
 6. The system as recited in claim 5, wherein the second counter level and/or the threshold value is variable.
 7. The system as recited in one of the preceding claims, characterized by a first fault counter (4) assigned to the monitoring device (3) and a second fault counter (14) assigned to the equipment (2) to be monitored, the first and second fault counters being able to be periodically checked and/or compared to each other, in order to monitor the system.
 8. The system as recited in claim 7, wherein the first fault counter (14) may be used to count a representation of the count of the second fault counter (4).
 9. The system as recited in claim 8, characterized by a third fault counter (24), which is used to compare the counts of the first and second fault counters (4, 14).
 10. A method to monitor equipment for measuring, controlling, and regulating, where the method of functioning of the monitored equipment is checked by a monitoring device, malfunctions that occur are detected, and the operating state of the monitored equipment is controllable, at least one counter (4, 14, 24) being used, whose count is increased in response to malfunctions being detected, and a first count being provided, where a reset and/or a fault reaction is triggered in response to the first count being reached or exceeded, wherein the difference between the count and the first count is reduced by an artificially generated malfunction, which causes an increase in the count.
 11. The method as recited in claim 10, wherein the method of functioning of the equipment (2) for measuring, controlling, and regulating is implemented, using communications operations carried out in regular intervals between the monitoring device (3) and the equipment (2), each communications operation signaling a malfunction or correct functioning.
 12. The method according to one of claims 10 or 11, wherein correct functioning is registered by reducing the count of the counter (4, 14, 24).
 13. The method as recited in one of claims 10 through 12, wherein the count is controllable independently of the occurrence of malfunctions.
 14. The method as recited in claim 10, wherein the count of the at least one counter (4, 14, 24) is compared to a threshold value, and a second counter level lying below the threshold value is defined, the count of the at least one counter (4, 14, 24) not being allowed to fall below the second counter level, and an artificially generated malfunction that increases the count being generated in response to the second counter level being reached.
 15. The method as recited in claim 14, wherein the second counter level and/or the threshold value is varied in order to set a desired availability or reaction time.